Joint Germline subworkflow haplotypecaller -> Vqsr #595
FriederikeHanssen merged 90 commits into nf-core:dev
Duplicate of #546, but more advanced, so taking over.
| chr_dir = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Sequence/Chromosomes" | ||
| dbsnp = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf" | ||
| dbsnp_tbi = "${params.igenomes_base}/Homo_sapiens/GATK/GRCh37/Annotation/GATKBundle/dbsnp_138.b37.vcf.idx" | ||
| dbsnp_vqsr = 'dbsnp,known=false,training=true,truth=false,prior=2 dbsnp_138.b37.vcf' |
I'd prefer having args in the modules.config, and avoiding adding extra files in igenomes.config
That doesn't fit the nf-core/modules style, since this is expected to be an input value.
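For illustration, the first suggestion (keeping tool arguments in modules.config) might look roughly like the fragment below. The process selector name `GATK4_VARIANTRECALIBRATOR` and the chosen flags are assumptions for this sketch and are not taken from the PR; only the general `ext.args` pattern is what the comment refers to.

```groovy
// Hypothetical conf/modules.config entry (selector name and flag values are assumptions)
process {
    withName: 'GATK4_VARIANTRECALIBRATOR' {
        // Plain tool arguments are kept here rather than in igenomes.config
        ext.args = '--mode SNP -an QD -an MQ -an FS'
    }
}
```

The reply's objection still applies to a fragment like this: if the module expects the resource labels as an input, `ext.args` would cover only the remaining arguments.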
| // resources for GATK joint germline variant recalibration | ||
| RESOURCE_SNP = [ | ||
| [ res_1000g, dbsnp ], | ||
| [ res_1000g, dbsnp_tbi ], | ||
| [ res_1000g_vqsr, dbsnp_vqsr ] | ||
| ] | ||
| resource_INDEL = [ | ||
| [ known_indels, dbsnp ], | ||
| [ known_indels_tbi, dbsnp_tbi ], | ||
| [ known_indels_mills_vqsr, known_indels_1000g_vqsr, dbsnp_vqsr ] | ||
| ] |
I like that, but I feel like it should be done in the sarek script or in the joint germline variant calling workflow instead
I would then have to use less descriptive names, because the files differ slightly between hg19 and hg38, so the naming convention has to match regardless of the genome.
What about something like known_snps? (dbsnp should stay separate because tools like haplotypecaller explicitly want that file.)
| meta, gvcf, tbi -> | ||
| interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : meta.id | ||
| new_meta = [id: "joint_germline", interval_name: interval_name, num_intervals: meta.num_intervals] | ||
| interval_name = meta.num_intervals > 1 ? (gvcf.simpleName - "${meta.id}_").replaceFirst("_",":") : file(params.intervals).simpleName |
I would be very careful here. I am afraid this may lead to the weird resume errors/unmatching meta maps we had before.
Hmm, would it be smarter to keep meta the same and group by another (temporary) key?
The problem is more the retrieval of the file name. I still can't explain this, but sometimes, when retrieving something from the file name like here, the name is incorrectly resolved later on (even though the matched file in the channel element is the correct one). In this case that would lead to some very wrong results. In the rest of sarek, since we group on patient ID, it only led to file name clashes: a) the bug was easy to find, and b) the actual output results were not impacted.
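To make the name derivation under discussion concrete, here is a minimal stand-alone Groovy sketch of the same string operation as in the diff, followed by the "temporary key" idea from the thread. Plain strings stand in for Nextflow file objects, and `deriveIntervalName`, the sample names, and the `<id>_<contig>_<start>-<end>` file naming are assumptions made for illustration, not code from the PR.

```groovy
// Hypothetical helper: same expression as in the diff, applied to a plain string.
String deriveIntervalName(Map meta, String gvcfSimpleName, String fallback) {
    // Strip the "<id>_" prefix, then turn the first "_" into ":" (chr1_1-1000 -> chr1:1-1000)
    return meta.num_intervals > 1 ?
        (gvcfSimpleName - "${meta.id}_").replaceFirst("_", ":") :
        fallback
}

// Stand-in for channel elements: [meta, gvcf simple name]
def records = [
    [[id: "sample1", num_intervals: 2], "sample1_chr1_1-1000"],
    [[id: "sample2", num_intervals: 2], "sample2_chr1_1-1000"]
]

// Group on a temporary key and leave each meta map untouched
def grouped = records.groupBy { rec -> deriveIntervalName(rec[0], rec[1], "no_interval") }
```

This sketch only shows the string handling; it does not reproduce or explain the resume issue described above.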
| ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline} | ||
| withName: 'GATK4_GENOMICSDBIMPORT' { | ||
| ext.prefix = { meta.num_intervals > 1 ? meta.intervals_name : "joint_interval" } | ||
| ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals} |
| ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals} | |
| ext.when = { params.tools && params.tools.split(',').contains('haplotypecaller') && params.joint_germline && !params.no_intervals} |
| .map{ meta, cram, crai, intervals -> | ||
| [meta, cram, crai, intervals, []] | ||
| intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName |
Here we also need a conditional for joint germline. This addition to meta can only happen for haplotypecaller + joint germline; otherwise we need to rewrite a bunch of logic for the single-sample case.
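One way the requested conditional could look is sketched below in stand-alone Groovy. The helper name `addIntervalsName` and the `opts` map (standing in for Nextflow's `params`) are assumptions for illustration; the condition itself mirrors the `ext.when` guards shown in the diff.

```groovy
// Hypothetical helper; `opts` stands in for Nextflow's `params`.
Map addIntervalsName(Map meta, String intervalsSimpleName, Map opts) {
    // Same condition as the ext.when guards in the diff
    boolean jointGermline = opts.tools &&
        opts.tools.split(',').contains('haplotypecaller') &&
        opts.joint_germline
    if (!jointGermline) {
        return meta   // single-sample case: meta is passed through unchanged
    }
    def intervals_name = meta.num_intervals == 0 ? "no_interval" : intervalsSimpleName
    return meta + [intervals_name: intervals_name]
}
```

Returning the original map in the single-sample case keeps downstream joins on `meta` working as before, which is the concern raised in the comment.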
| //If no interval file provided (0) then add empty list | ||
| intervals_new = num_intervals == 0 ? [] : intervals | ||
| intervals_new = num_intervals == 0 ? [] : intervals |
| intervals_new = num_intervals == 0 ? [] : intervals | |
| intervals_new = num_intervals == 0 ? [] : intervals |
| //Merge scatter/gather vcfs & index | ||
| //Rework meta for variantscalled.csv and annotation tools | ||
| MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf -> | ||
| [[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf] |
can you do the same nice formatting here as you did in line 116 ff?
| [meta, cram, crai, intervals, []] | ||
| intervals_name = meta.num_intervals == 0 ? "no_interval" : intervals.simpleName | ||
| new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name] |
| new_meta = [patient:meta.patient, sample:meta.sample, sex:meta.sex, status:meta.status, id:meta.sample, data_type:meta.data_type, num_intervals:meta.num_intervals, intervals_name:intervals_name] | |
| new_meta = [ | |
| data_type:meta.data_type, | |
| id:meta.sample, | |
| intervals_name:intervals_name, | |
| num_intervals:meta.num_intervals, | |
| patient:meta.patient, | |
| sample:meta.sample, | |
| sex:meta.sex, | |
| status:meta.status | |
| ] |
| versions = ch_versions // channel: [ versions.yml ] | ||
| versions = ch_versions // channel: [ versions.yml ] | ||
| genotype_vcf = Channel.empty().mix(vcfs_sorted_input.no_intervals, |
The vcfs_sorted_input.no_intervals branch also needs variantcaller: "haplotypecaller" in its meta map here, to make sure annotation places it in the proper folder.
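A minimal stand-alone Groovy sketch of that change follows. A list of `[meta, vcf]` pairs stands in for the channel, and the file name is a placeholder; in the workflow itself the same closure body would presumably sit in a `.map { meta, vcf -> ... }` on `vcfs_sorted_input.no_intervals`.

```groovy
// Stand-in for the no_intervals channel: a list of [meta, vcf] pairs (placeholder values)
def noIntervals = [
    [[id: "joint_variant_calling", num_intervals: 0], "joint.vcf.gz"]
]

// Add the variantcaller key without mutating the original meta map
def tagged = noIntervals.collect { rec ->
    def (meta, vcf) = rec
    [meta + [variantcaller: "haplotypecaller"], vcf]
}
```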
| //Merge scatter/gather vcfs & index | ||
| //Rework meta for variantscalled.csv and annotation tools | ||
| MERGE_GENOTYPEGVCFS(vcfs_sorted_input.intervals.map{meta, vcf -> | ||
| [[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf] |
| [[id: "joint_variant_calling", patient: "all_samples", variantcaller: "haplotypecaller", num_intervals: meta.num_intervals], vcf] | |
| [[ | |
| id: "joint_variant_calling", | |
| num_intervals: meta.num_intervals, | |
| patient: "all_samples", | |
| variantcaller: "haplotypecaller" | |
| ], vcf] |
Co-authored-by: FriederikeHanssen <Friederike.hanssen@qbic.uni-tuebingen.de>
Co-authored-by: Maxime U. Garcia <maxime.garcia@scilifelab.se>
PR checklist
scrape_software_versions.py
(nf-core lint .).
(nextflow run . -profile test,docker).
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).